Abstract. Many authors have studied the phenomenon of typically Gaussian marginals of high-dimensional random vectors; e.g., for a probability measure on $\mathbb{R}^d$, under mild conditions, most one-dimensional marginals are approximately Gaussian if $d$ is large. In earlier work, the author used entropy techniques and Stein's method to show that this phenomenon persists in the bounded-Lipschitz distance for $k$-dimensional marginals of $d$-dimensional distributions, if $k = o(\sqrt{\log(d)})$. In this paper, a somewhat different approach is used to show that the phenomenon persists if $k < \frac{2\log(d)}{\log(\log(d))}$, and that this estimate is best possible.



1. Introduction 

The explicit study of typical behavior of the margins of high-dimensional probability measures goes back to Sudakov [21], although some of the central ideas appeared much earlier; e.g., the 1906 monograph [2] of Borel, which contains the first rigorous proof that projections of uniform measure on the $n$-dimensional sphere are approximately Gaussian for large $n$. Subsequent major contributions were made by Diaconis and Freedman [3], von Weizsäcker, Bobkov [1], and Klartag [8], among others. The objects of study are a random vector $X \in \mathbb{R}^d$ and its projections onto subspaces; the central problem here is to show that for most subspaces, the resulting distributions are about the same, approximately Gaussian, and moreover to determine how large the dimension $k$ of the subspace may be relative to $d$ for this phenomenon to persist. This aspect of the problem in particular was addressed in earlier work [10] of the author. In this paper, a different approach is presented to proving the main result of [10], which, in addition to being technically simpler and perhaps more geometrically natural, also gives a noticeable quantitative improvement. The result shows that the phenomenon of typical Gaussian marginals persists under mild conditions for $k < \frac{2\log(d)}{\log(\log(d))}$, as opposed to the results of [10], which require $k = o(\sqrt{\log(d)})$ (note that a misprint in the abstract of that paper claimed that $k = o(\log(d))$ was sufficient).

The fact that typical $k$-dimensional projections of probability measures on $\mathbb{R}^d$ are approximately Gaussian when $k < \frac{2\log(d)}{\log(\log(d))}$ can be viewed as a measure-theoretic version of a famous theorem of Dvoretzky [5], V. Milman's proof of which [12] shows that for $\epsilon > 0$ fixed and $X$ a $d$-dimensional Banach space, typical $k$-dimensional subspaces $E \subseteq X$ are $(1+\epsilon)$-isomorphic to a Hilbert space, if $k < C(\epsilon)\log(d)$. (This is the usual formulation, although one can give a dual formulation in terms of projections and quotient norms rather than subspaces.) These results should be viewed as analogous, in the following sense: in both cases, an additional structure is imposed on $\mathbb{R}^d$ (a norm in the case of Dvoretzky's theorem; a probability measure in the present context); in either case, there is a particularly nice way to do this (the Euclidean norm and the Gaussian distribution, respectively). The question is then: if one
projects an arbitrary norm or probability measure onto lower dimensional subspaces, does 
it tend to resemble this nice structure? If so, by how much must one reduce the dimension 
in order to see this phenomenon? 

Aside from the philosophical similarity of these results, they are also similar in that additional natural geometric assumptions lead to better behavior under projections. The main result of Klartag [9] shows that if the random vector $X \in \mathbb{R}^d$ is assumed to have a log-concave distribution, then typical marginals of the distribution of $X$ are approximately Gaussian even when $k = d^{\epsilon}$ (for a specific universal constant $\epsilon \in (0,1)$). This should be compared in the context of Dvoretzky's theorem to, for example, the result of Figiel, Lindenstrauss and V. Milman [6] showing that if a $d$-dimensional Banach space $X$ has cotype $q \in [2,\infty)$, then $X$ has subspaces of dimension of the order $d^{2/q}$ which are approximately Euclidean; or the result of Szarek [15] showing that if $X$ has bounded volume ratio, then $X$ has nearly Euclidean subspaces of dimension $\frac{d}{2}$. One interesting difference in the measure-theoretic context from the classical context is that, for measures, it is possible to determine which subspaces have approximately Gaussian projections under symmetry assumptions on the measure (see M. Meckes [11]); there is no known method to find explicit almost Euclidean subspaces of Banach spaces, even under natural geometric assumptions such as symmetry properties.

Following the statements of the main results below, an example is given to show that the estimate $k < \frac{2\log(d)}{\log(\log(d))}$ is best possible in the metric used here.

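Before formally stating the results, some notation and context are needed. The Stiefel manifold $\mathfrak{W}_{d,k}$ is defined by

$$\mathfrak{W}_{d,k} := \{\theta = (\theta_1,\ldots,\theta_k) : \theta_i \in \mathbb{R}^d,\ \langle\theta_i,\theta_j\rangle = \delta_{ij}\ \forall\, 1 \le i,j \le k\},$$

with metric $\rho(\theta,\theta') = \left[\sum_{j=1}^k |\theta_j - \theta_j'|^2\right]^{1/2}$. The manifold $\mathfrak{W}_{d,k}$ possesses a rotation-invariant (Haar) probability measure.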

Let $X$ be a random vector in $\mathbb{R}^d$ and let $\theta \in \mathfrak{W}_{d,k}$. Let

$$X_\theta := (\langle X,\theta_1\rangle,\ldots,\langle X,\theta_k\rangle);$$

that is, $X_\theta$ is the projection of $X$ onto the span of $\theta$. Consider also the "annealed" version $X_\Theta$ for $\Theta \in \mathfrak{W}_{d,k}$ distributed according to Haar measure and independent of $X$. The notation $\mathbb{E}_X[\cdot]$ is used to denote expectation with respect to $X$ only; that is, $\mathbb{E}_X[f(X,\Theta)] = \mathbb{E}[f(X,\Theta)\mid\Theta]$. When $X_\Theta$ is being thought of as conditioned on $\Theta$ with randomness coming from $X$ only, it is written $X_\theta$. The following results describe the behavior of the random variables $X_\theta$ and $X_\Theta$. In what follows, $c$ and $C$ are used to denote universal constants which need not be the same in every appearance.
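A Haar-distributed point of $\mathfrak{W}_{d,k}$, and the corresponding projection $X_\theta$, can be simulated with a standard recipe: orthonormalize a $d\times k$ matrix of i.i.d. standard Gaussians by a QR decomposition, fixing signs so that the triangular factor has a positive diagonal. The Python sketch below assumes NumPy; `haar_stiefel` and `project` are illustrative names, and the columns of the returned matrix play the role of $\theta_1,\ldots,\theta_k$.

```python
import numpy as np

def haar_stiefel(d, k, rng):
    """Sample theta in W_{d,k} from Haar measure.

    Columns of the returned d x k matrix are theta_1, ..., theta_k.
    QR of a Gaussian matrix, with column signs fixed so that R has a
    positive diagonal, gives the rotation-invariant distribution.
    """
    g = rng.standard_normal((d, k))
    q, r = np.linalg.qr(g)               # reduced QR: q is d x k
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs                     # flip columns so that diag(R) > 0

def project(x, theta):
    """Return X_theta = (<X, theta_1>, ..., <X, theta_k>) for each row of x."""
    return x @ theta                     # shape (n_samples, k)

rng = np.random.default_rng(0)
theta = haar_stiefel(d=50, k=3, rng=rng)
x = rng.standard_normal((1000, 50))      # 1000 samples of a random vector X in R^50
x_theta = project(x, theta)
```

Drawing a fresh `theta` for each trial mimics the annealed variable $X_\Theta$; holding `theta` fixed and averaging only over the rows of `x` corresponds to $X_\theta$.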

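Theorem 1. Let $X$ be a random vector in $\mathbb{R}^d$, with $\mathbb{E}X = 0$, $\mathbb{E}[|X|^2] = \sigma^2 d$, and let $A := \mathbb{E}\big||X|^2\sigma^{-2} - d\big|$. If $\Theta$ is a random point of $\mathfrak{W}_{d,k}$, $X_\Theta$ is defined as above, and $Z$ is a standard Gaussian random vector, then

$$d_{BL}(X_\Theta,\sigma Z) \le \frac{\sigma[\sqrt{k}(A+1)+k]}{d-1}.$$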

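Theorem 2. Let $Z$ be a standard Gaussian random vector. Let

$$B := \sup_{\xi\in S^{d-1}}\mathbb{E}\langle X,\xi\rangle^2.$$

For $\theta \in \mathfrak{W}_{d,k}$, let

$$d_{BL}(X_\theta,\sigma Z) := \sup_{\max\{\|f\|_\infty,|f|_L\}\le 1}\big|\mathbb{E}_X f(X_\theta) - \mathbb{E}f(\sigma Z)\big|;$$

that is, $d_{BL}(X_\theta,\sigma Z)$ is the conditional bounded-Lipschitz distance from $X_\Theta$ to $\sigma Z$, conditioned on $\Theta$. Then if $\mathbb{P}_{d,k}$ denotes the Haar measure on $\mathfrak{W}_{d,k}$,

$$\mathbb{P}_{d,k}\big[\theta : |d_{BL}(X_\theta,\sigma Z) - \mathbb{E}\,d_{BL}(X_\Theta,\sigma Z)| > \epsilon\big] \le Ce^{-\frac{cd\epsilon^2}{B}}.$$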
Theorem 3. With notation as in the previous theorems, 

(Y ^r,,^r^\ {kB + Blogid))B^^ , a[^(A + l) + k] 
iiLdBL[X0,aZ) < C 2 \ — 

In particular, under the additional assumptions that A < C'y/d and B = 1, then 

k^d^k+t 

Remark: The assumption that $B = 1$ is automatically satisfied if the covariance matrix of $X$ is the identity; in the language of convex geometry, this is simply the case that the vector $X$ is isotropic. The assumption that $A = O(\sqrt{d})$ is a geometrically natural one which arises, for example, if $X$ is distributed uniformly on the isotropic dilate of the $\ell_1$ ball in $\mathbb{R}^d$.

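Together, Theorems 2 and 3 give the following.
Corollary 4. Let $X$ be a random vector in $\mathbb{R}^d$ satisfying

$$\mathbb{E}|X|^2 = \sigma^2 d, \qquad \mathbb{E}\big||X|^2\sigma^{-2} - d\big| \le L\sqrt{d}, \qquad \sup_{\xi\in S^{d-1}}\mathbb{E}\langle\xi,X\rangle^2 \le 1.$$

Let $X_\theta$ denote the projection of $X$ onto the span of $\theta$, for $\theta \in \mathfrak{W}_{d,k}$. Fix $a > 0$ and $b < 2$ and suppose that $k = \delta\frac{\log(d)}{\log(\log(d))}$ with $a \le \delta \le b$. Then there is a $c > 0$ depending only on $a$ and $b$ such that for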

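$$\epsilon = 2\exp\big[-c\log(\log(d))\big],$$

there is a subset $\mathfrak{T} \subseteq \mathfrak{W}_{d,k}$ with $\mathbb{P}_{d,k}[\mathfrak{T}] \ge 1 - C\exp(-c'd\epsilon^2)$, such that for all $\theta \in \mathfrak{T}$,

$$d_{BL}(X_\theta,\sigma Z) \le C'\epsilon.$$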

Remark: For the bound on $\mathbb{E}\,d_{BL}(X_\Theta,\sigma Z)$ given in [10] to tend to zero as $d \to \infty$, it is necessary that $k = o(\sqrt{\log(d)})$, whereas Theorem 3 gives a similar result if $k = \delta\frac{\log(d)}{\log(\log(d))}$ for $\delta < 2$. Moreover, the following example shows that the bound above is best possible in our metric.
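To see numerically where the condition $\delta < 2$ comes from, one can evaluate the logarithm of the rate $\frac{k+\log(d)}{k^{2/3}d^{2/(3k+4)}}$ from the case $B = 1$, $A \le C'\sqrt{d}$ of Theorem 3 along $k = \delta\log(d)/\log(\log(d))$, ignoring the unspecified constant $C$. The Python sketch below takes $\log(d)$ itself as input to avoid overflow; `log_rate` is a hypothetical helper name, and since $C$ is unknown only the sign and trend of the output carry information.

```python
import math

def log_rate(log_d, delta):
    """log of (k + log d) / (k^(2/3) * d^(2/(3k+4))) with k = delta * log d / log log d.

    The universal constant C of Theorem 3 is omitted, and k is treated
    as a real number rather than rounded to an integer.
    """
    k = delta * log_d / math.log(log_d)
    return (math.log(k + log_d)
            - (2.0 / 3.0) * math.log(k)
            - 2.0 * log_d / (3.0 * k + 4.0))

for delta in (1.0, 3.0):
    print(delta, [round(log_rate(10.0 ** p, delta), 2) for p in (3, 6, 9)])
```

For $\delta = 1$ the values decrease as $\log(d)$ grows (slowly, roughly like a negative multiple of $\log(\log(d))$), while for $\delta = 3$ the rate exceeds 1 and gives no information.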



1.1. Sharpness. In the presence of log-concavity of the distribution of $X$, Klartag [9] proved a stronger result than Corollary 4 above; namely, that the typical total variation distance between $X_\theta$ and the corresponding Gaussian distribution is small even when $\theta \in \mathfrak{W}_{d,k}$ and $k = d^{\epsilon}$ (for a specific universal constant $\epsilon \in (0,1)$). The result above allows $k$ to grow only a bit more slowly than logarithmically with $d$. However, as the following example shows, either the log-concavity or some other additional assumption is necessary; with only the assumptions here, logarithmic-type growth of $k$ in $d$ is best possible for the bounded-Lipschitz metric. (It should be noted that the specific constants appearing in the results above are almost certainly non-optimal.)

Let $X$ be distributed uniformly among $\{\pm\sqrt{d}e_1,\ldots,\pm\sqrt{d}e_d\}$, where the $e_i$ are the standard basis vectors of $\mathbb{R}^d$. That is, $X$ is uniformly distributed on the vertices of a cross-polytope. Then $\mathbb{E}[X] = 0$, $|X|^2 = d$, and given $\xi \in S^{d-1}$, $\mathbb{E}\langle X,\xi\rangle^2 = 1$; Theorems 1, 2 and 3 apply with $\sigma = 1$, $A = 0$ and $B = 1$.
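The values $\mathbb{E}[X] = 0$, $\sigma = 1$, $A = 0$ and $B = 1$ for this example can be confirmed by enumerating the $2d$ equally likely values of $X$. For centered $X$, $B = \sup_{\xi\in S^{d-1}}\mathbb{E}\langle X,\xi\rangle^2$ is the largest eigenvalue of the covariance matrix $\mathbb{E}[XX^T]$, which is what this NumPy sketch computes (the function name is illustrative).

```python
import numpy as np

def cross_polytope_constants(d):
    """Return (mean, sigma^2, A, B) for X uniform on {±sqrt(d) e_1, ..., ±sqrt(d) e_d}."""
    pts = np.sqrt(d) * np.vstack([np.eye(d), -np.eye(d)])   # the 2d atoms of X
    mean = pts.mean(axis=0)
    sq_norms = (pts ** 2).sum(axis=1)                         # |X|^2 at each atom
    sigma2 = sq_norms.mean() / d                              # E|X|^2 = sigma^2 d
    A = np.abs(sq_norms / sigma2 - d).mean()                  # E| |X|^2 sigma^-2 - d |
    cov = pts.T @ pts / len(pts)                              # E[X X^T]
    B = np.linalg.eigvalsh(cov).max()                         # sup over unit xi of E<X, xi>^2
    return mean, sigma2, A, B

mean, sigma2, A, B = cross_polytope_constants(20)
```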

Consider a projection of $\{\pm\sqrt{d}e_1,\ldots,\pm\sqrt{d}e_d\}$ onto a random subspace $E$ of dimension $k$, and define the Lipschitz function $f : E \to \mathbb{R}$ by $f(x) := (1 - d(x,S_E))_+$, where $S_E$ is the image of $\{\pm\sqrt{d}e_1,\ldots,\pm\sqrt{d}e_d\}$ under projection onto $E$ and $d(x,S_E)$ denotes the (Euclidean) distance from the point $x$ to the set $S_E$. Then if $\mu_{S_E}$ denotes the probability measure putting equal mass at each of the points of $S_E$, $\int f\,d\mu_{S_E} = 1$. On the other hand, it is classical (see, e.g., [7]) that the volume $\omega_k$ of the unit ball in $\mathbb{R}^k$ is asymptotically given by $\frac{1}{\sqrt{k\pi}}\left[\frac{2\pi e}{k}\right]^{k/2}$ for large $k$, in the sense that the ratio tends to one as $k$ tends to infinity.

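It follows that the standard Gaussian measure of a ball of radius 1 in $\mathbb{R}^k$ is bounded by $(2\pi)^{-k/2}\omega_k \sim \frac{1}{\sqrt{k\pi}}\left[\frac{e}{k}\right]^{k/2}$. If $\gamma_k$ denotes the standard Gaussian measure in $\mathbb{R}^k$, then, since $0 \le f \le 1$ and $f$ vanishes outside the union of the $2d$ unit balls centered at the points of $S_E$, this estimate means that $\int f\,d\gamma_k \le 2d(2\pi)^{-k/2}\omega_k \sim \frac{2d}{\sqrt{k\pi}}\left[\frac{e}{k}\right]^{k/2}$. Now, if $k = \frac{c\log(d)}{\log(\log(d))}$ for $c > 2$, then this bound tends to zero, and thus, since $\|f\|_{BL} \le 1$, $d_{BL}(\mu_{S_E},\gamma_k) \ge 1 - \int f\,d\gamma_k$ is close to 1 for any choice of the subspace $E$; the measures $\mu_{S_E}$ are far from Gaussian in this regime.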
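To see the threshold $c = 2$ concretely, the bound $2d(2\pi)^{-k/2}\omega_k$ can be evaluated exactly in logarithmic form, using $\omega_k = \pi^{k/2}/\Gamma(k/2+1)$ and Python's `math.lgamma`, and compared with the large-$k$ form $\frac{2d}{\sqrt{k\pi}}\left[\frac{e}{k}\right]^{k/2}$. The sketch below takes $\log(d)$ as its input to avoid overflow and treats $k = c\log(d)/\log(\log(d))$ as a real number; the helper names are illustrative.

```python
import math

def log_gaussian_mass_bound(log_d, k):
    """log of 2d (2*pi)^(-k/2) omega_k, an upper bound for the integral of f against gamma_k."""
    log_omega_k = (k / 2) * math.log(math.pi) - math.lgamma(k / 2 + 1)   # unit-ball volume
    return math.log(2) + log_d - (k / 2) * math.log(2 * math.pi) + log_omega_k

def log_asymptotic(log_d, k):
    """log of (2d / sqrt(k*pi)) * (e/k)^(k/2), the large-k form used in the text."""
    return math.log(2) + log_d - 0.5 * math.log(k * math.pi) + (k / 2) * (1 - math.log(k))

log_d = 1e6
for c in (1.5, 3.0):
    k = c * log_d / math.log(log_d)
    print(c, log_gaussian_mass_bound(log_d, k), log_asymptotic(log_d, k))
```

For $\log(d) = 10^6$ the log-bound is positive at $c = 1.5$ (so the bound says nothing) and hugely negative at $c = 3$, and the exact and asymptotic forms agree to many digits.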

Taken together with Corollary 4, this shows that the phenomenon of typically Gaussian marginals persists for $k = \frac{\delta\log(d)}{\log(\log(d))}$ with $\delta < 2$, but fails in general if $k = \frac{c\log(d)}{\log(\log(d))}$ for $c > 2$.

Continuing the analogy with Dvoretzky's theorem, it is worth noting here that, for the 
projection formulation of Dvoretzky's theorem (the dual viewpoint to the slicing version 
discussed above), the worst case behavior is achieved for the $\ell_1$ ball, that is, for the convex
hull of the points considered above. 

1.2. Acknowledgements. The author thanks Mark Meckes for many useful discussions, 
without which this paper may never have been completed. Thanks also to Michel Talagrand, 
who pointed out a simplification in the proof of the main theorem. 

2. Proofs 

Theorems 1 and 2 were proved in [10], and their proofs will not be reproduced.

This section is mainly devoted to the proof of Theorem 3, but first some more definitions and notation are needed. Firstly, a comment on distance: as is clear from the statement of Theorems 2 and 3, the metric on random variables used here is the bounded-Lipschitz distance, defined by $d_{BL}(X,Y) := \sup_f|\mathbb{E}f(X) - \mathbb{E}f(Y)|$, where the supremum is taken over functions $f$ with $\|f\|_{BL} := \max\{\|f\|_\infty,|f|_L\} \le 1$ ($|f|_L$ is the Lipschitz constant of $f$).

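A centered stochastic process $\{X_t\}_{t\in T}$ indexed by a space $T$ with a metric $d$ is said to satisfy a sub-Gaussian increment condition if there are constants $C, c > 0$ such that, for all $s, t \in T$ and all $\epsilon > 0$,

$$\mathbb{P}\big[|X_s - X_t| > \epsilon\big] \le C\exp\left(-\frac{c\,\epsilon^2}{d(s,t)^2}\right). \tag{1}$$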



A crucial point for the proof of Theorem 3 is that in the presence of a sub-Gaussian increment condition, there are powerful tools available to bound the expected supremum of a stochastic process; the one used here is the entropy bound of Dudley formulated in terms of entropy numbers à la Talagrand [H]. For $n \ge 0$, the entropy number $e_n(T,d)$ is defined by

$$e_n(T,d) := \inf\left\{\sup_t d(t,T_n) : T_n \subseteq T,\ |T_n| \le 2^{2^n}\right\}.$$

Dudley's entropy bound is the following. 

Theorem 5 (Dudley). If $\{X_t\}_{t\in T}$ is a centered stochastic process satisfying the sub-Gaussian increment condition (1), then there is a constant $L$ such that

$$\mathbb{E}\left[\sup_{t\in T}X_t\right] \le L\sum_{n=0}^{\infty}2^{n/2}e_n(T,d). \tag{2}$$



We now give the proof of the main theorem. 

Proof of Theorem 3. As in [10], the key initial step is to view the distance as the supremum of a stochastic process: let $X_f = X_f(\theta) := \mathbb{E}_X f(X_\theta) - \mathbb{E}f(X_\Theta)$. Then $\{X_f\}_f$ is a centered stochastic process indexed by the unit ball of $\|\cdot\|_{BL}$, and $d_{BL}(X_\theta,X_\Theta) = \sup_{\|f\|_{BL}\le 1}X_f$. The fact that Haar measure on $\mathfrak{W}_{d,k}$ has a measure-concentration property for Lipschitz functions (see [13]) implies that $X_f$ is a sub-Gaussian process, as follows.

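Let $f : \mathbb{R}^k \to \mathbb{R}$ be Lipschitz with Lipschitz constant $L$ and consider the function $G = G_f$ defined on $\mathfrak{W}_{d,k}$ by

$$G(\theta_1,\ldots,\theta_k) = \mathbb{E}_X f(X_\theta) = \mathbb{E}\big[f(\langle\theta_1,X\rangle,\ldots,\langle\theta_k,X\rangle)\,\big|\,\theta\big].$$

Then

$$\begin{aligned}|G(\theta) - G(\theta')| &\le L\,\mathbb{E}\Big[\big|(\langle X,\theta_1-\theta_1'\rangle,\ldots,\langle X,\theta_k-\theta_k'\rangle)\big|\,\Big|\,\theta,\theta'\Big]\\ &\le L\sqrt{\sum_{j=1}^k\mathbb{E}\big[\langle X,\theta_j-\theta_j'\rangle^2\,\big|\,\theta,\theta'\big]}\\ &\le L\rho(\theta,\theta')\sqrt{B};\end{aligned}$$

thus $G(\theta)$ is a Lipschitz function on $\mathfrak{W}_{d,k}$, with Lipschitz constant $L\sqrt{B}$. It follows immediately from Theorem 6.6 and Remark 6.7.1 of [13] that

$$\mathbb{P}_{d,k}\big[|G(\theta) - M_G| > \epsilon\big] \le \sqrt{\frac{\pi}{2}}\,e^{-\frac{\epsilon^2 d}{8L^2B}},$$

where $M_G$ is the median of $G$ with respect to Haar measure on $\mathfrak{W}_{d,k}$. It is then a straightforward exercise to show that for some universal constants $C$ and $c$,

$$\mathbb{P}\big[|G(\Theta) - \mathbb{E}G(\Theta)| > \epsilon\big] \le Ce^{-\frac{cd\epsilon^2}{L^2B}}. \tag{3}$$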

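Observe that, for a Haar-distributed random point $\Theta$ of $\mathfrak{W}_{d,k}$, $\mathbb{E}G(\Theta) = \mathbb{E}f(X_\Theta)$, and so (3) can be restated as $\mathbb{P}[|X_f| > \epsilon] \le C\exp\left[-\frac{cd\epsilon^2}{L^2B}\right]$.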
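The Lipschitz estimate $|G(\theta) - G(\theta')| \le L\rho(\theta,\theta')\sqrt{B}$ can be sanity-checked numerically. The sketch below (NumPy assumed; all names are illustrative) uses the cross-polytope distribution from the sharpness example, for which $B = 1$ and $\mathbb{E}_X$ is an exact average over $2d$ atoms, takes the $1$-Lipschitz test function $f(y) = |y|$ (boundedness plays no role in this step), and compares $|G(\theta) - G(\theta')|$ with $\rho(\theta,\theta')$ for a pair of nearby frames.

```python
import numpy as np

def orthonormalize(m):
    """Orthonormalize the columns of m by QR, with signs fixed so that diag(R) > 0."""
    q, r = np.linalg.qr(m)
    s = np.sign(np.diag(r))
    s[s == 0] = 1.0
    return q * s

def G(theta, atoms, f):
    """G_f(theta) = E_X f(X_theta), computed exactly as an average over the atoms of X."""
    return np.mean([f(a @ theta) for a in atoms])

def f(y):
    """Euclidean norm on R^k: Lipschitz with constant L = 1."""
    return np.linalg.norm(y)

d, k = 30, 2
atoms = np.sqrt(d) * np.vstack([np.eye(d), -np.eye(d)])   # cross-polytope example, B = 1

rng = np.random.default_rng(1)
theta = orthonormalize(rng.standard_normal((d, k)))
theta2 = orthonormalize(theta + 0.05 * rng.standard_normal((d, k)))
rho = np.linalg.norm(theta - theta2)       # rho(theta, theta'): Frobenius norm of the difference
gap = abs(G(theta, atoms, f) - G(theta2, atoms, f))
print(gap, "<=", rho)                      # L * rho * sqrt(B) with L = B = 1
```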

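Note that $X_f - X_g = X_{f-g}$; thus, for $|f - g|_L$ the Lipschitz constant of $f - g$ and $\|f - g\|_{BL}$ the bounded-Lipschitz norm of $f - g$,

$$\mathbb{P}\big[|X_f - X_g| > \epsilon\big] \le C\exp\left[-\frac{cd\epsilon^2}{B|f-g|_L^2}\right] \le C\exp\left[-\frac{cd\epsilon^2}{B\|f-g\|_{BL}^2}\right].$$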
The process $\{X_f\}$ therefore satisfies the sub-Gaussian increment condition in the metric $d^*(f,g) := \sqrt{\frac{B}{d}}\,\|f - g\|_{BL}$; in particular, the entropy bound applies. We will not be able to apply it directly, but rather use a sequence of approximations to arrive at a bound.
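The first step is to truncate the indexing functions. Let

$$\varphi_R(x) := \begin{cases}1, & |x| \le R,\\ R+1-|x|, & R < |x| \le R+1,\\ 0, & R+1 < |x|,\end{cases}$$

and define $f_R := f\cdot\varphi_R$. It is easy to see that if $\|f\|_{BL} \le 1$, then $\|f_R\|_{BL} \le 2$ (since $|f\varphi_R|_L \le \|f\|_\infty|\varphi_R|_L + \|\varphi_R\|_\infty|f|_L$). Since $f(x) - f_R(x) = 0$ if $x \in B_R$ and $|f(x) - f_R(x)| \le 1$ for all $x \in \mathbb{R}^k$,

$$|\mathbb{E}_X f(X_\theta) - \mathbb{E}_X f_R(X_\theta)| \le \mathbb{P}\big[|X_\theta| > R\,\big|\,\theta\big] \le \frac{1}{R^2}\sum_{i=1}^k\mathbb{E}_X\big[\langle X,\theta_i\rangle^2\big] \le \frac{kB}{R^2},$$

and the same holds if $\mathbb{E}_X$ is replaced by $\mathbb{E}$. It follows that $|X_f - X_{f_R}| \le \frac{2kB}{R^2}$. Consider therefore the process $X_f$ indexed by $BL_{2,R+1}$ (with norm $\|\cdot\|_{BL}$), for some choice of $R$ to be determined, where

$$BL_{2,R+1} := \{f : \mathbb{R}^k \to \mathbb{R} : \|f\|_{BL} \le 2;\ f(x) = 0 \text{ if } |x| > R+1\};$$

what has been shown is that

$$\mathbb{E}\left[\sup_{\|f\|_{BL}\le 1}X_f\right] \le \mathbb{E}\left[\sup_{f\in BL_{2,R+1}}X_f\right] + \frac{2kB}{R^2}. \tag{4}$$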
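The cutoff $\varphi_R$ and the truncated function $f_R = f\varphi_R$ are simple to write down. The Python sketch below (NumPy assumed; names are illustrative) implements them for points of $\mathbb{R}^k$ and estimates $\|f_R\|_{BL}$ crudely from random nearby pairs of points, for a test function with $\|f\|_{BL} = 1$, to illustrate the bound $\|f_R\|_{BL} \le 2$. A random-pair estimate can only underestimate the true Lipschitz constant, so this is an illustration rather than a proof.

```python
import numpy as np

def phi(x, R):
    """Cutoff phi_R: 1 for |x| <= R, R + 1 - |x| for R < |x| <= R + 1, 0 beyond."""
    r = np.linalg.norm(x, axis=-1)
    return np.clip(R + 1.0 - r, 0.0, 1.0)

def truncate(f, R):
    """Return the truncated function f_R = f * phi_R."""
    def f_R(x):
        return f(x) * phi(x, R)
    return f_R

def f(x):
    """cos of the first coordinate: ||f||_inf = 1 and |f|_L = 1, so ||f||_BL = 1."""
    return np.cos(x[..., 0])

R = 2.0
f_R = truncate(f, R)

# Crude lower estimate of ||f_R||_BL from random nearby pairs of points in R^3.
rng = np.random.default_rng(2)
x = rng.uniform(-4.0, 4.0, size=(20000, 3))
y = x + 0.01 * rng.standard_normal(x.shape)
slopes = np.abs(f_R(x) - f_R(y)) / np.linalg.norm(x - y, axis=1)
est_bl = max(np.abs(f_R(x)).max(), slopes.max())
print(est_bl)   # should not exceed 2
```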



The next step is to approximate functions in $BL_{2,R+1}$ by "piecewise linear" functions. Specifically, consider a cubic lattice of edge length $\epsilon$ in $\mathbb{R}^k$. Triangulate each cube of the lattice into simplices inductively as follows: in $\mathbb{R}^2$, add an extra vertex in the center of each square to divide the square into four triangles. To triangulate the cube of $\mathbb{R}^k$, first triangulate each facet as was described in the previous stage of the induction. Then add a new vertex at the center of the cube; connecting it to each of the vertices of each of the facets gives a triangulation into simplices. Observe that when this procedure is carried out, each new vertex added is on a cubic lattice of edge length $\frac{\epsilon}{2}$. Let $\mathcal{L}$ denote the supplemented lattice comprised of the original cubic lattice, together with the additional vertices needed for the triangulation. The number of sites of $\mathcal{L}$ within the ball of radius $R + 1$ is then bounded by, e.g., $\left(\frac{cR}{\epsilon}\right)^k\omega_k$, where $\omega_k$ is the volume of the unit ball in $\mathbb{R}^k$.

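Now approximate $f \in BL_{2,R+1}$ by the function $\tilde f$ defined such that $\tilde f(x) = f(x)$ for $x \in \mathcal{L}$, and the graph of $\tilde f$ is determined by taking the convex hull of the vertices of the image under $f$ of each $k$-dimensional simplex determined by $\mathcal{L}$. The resulting function $\tilde f$ still has $\|\tilde f\|_{BL} \le 2$, and $\|f - \tilde f\|_\infty \le 2\epsilon\sqrt{k}$, since the distance between points in the same simplex is bounded by $\epsilon\sqrt{k}$. Moreover,

$$\|\tilde f\|_{BL} = \max\left\{\sup_{x\in\mathcal{L}}|f(x)|,\ \sup_{x\sim y}\frac{|f(x) - f(y)|}{|x - y|}\right\},$$

where $x \sim y$ if $x, y \in \mathcal{L}$ and $x$ and $y$ are part of the same triangulating simplex. Observe that, for a given $x \in \mathcal{L}$, those vertices which are part of a triangulating simplex with $x$ are all contained in a cube centered at $x$ of edge length $2\epsilon$; the number of such points is thus bounded by $5^k$, and the number of differences which must be considered in order to compute the Lipschitz constant of $\tilde f$ is therefore bounded by $5^k\left(\frac{cR}{\epsilon}\right)^k\omega_k$. Recall that $\omega_k \sim \frac{1}{\sqrt{k\pi}}\left[\frac{2\pi e}{k}\right]^{k/2}$ for large $k$, and so the number of differences determining the Lipschitz constant of $\tilde f$ is bounded by $\frac{c}{\sqrt{k}}\left(\frac{c'R}{\epsilon\sqrt{k}}\right)^k$ for some absolute constants $c, c'$. It follows (using $|X_f - X_{\tilde f}| \le 2\|f - \tilde f\|_\infty$) that

$$\mathbb{E}\left[\sup_{f\in BL_{2,R+1}}X_f\right] \le \mathbb{E}\left[\sup_{f\in BL_{2,R+1}}X_{\tilde f}\right] + 4\epsilon\sqrt{k}, \tag{5}$$

that the process $\{X_{\tilde f}\}_{f\in BL_{2,R+1}}$ is sub-Gaussian with respect to $\sqrt{\frac{B}{d}}\,\|\cdot\|_{BL}$, and that the values of $\|\tilde f\|_{BL}$ for $f \in BL_{2,R+1}$ are determined by a point of the ball $2B_\infty^M$ of $\ell_\infty^M$, where

$$M = \frac{c}{\sqrt{k}}\left(\frac{c'R}{\epsilon\sqrt{k}}\right)^k. \tag{6}$$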
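In dimension $k = 1$ the construction reduces to a grid on the line, and $\tilde f$ is the ordinary piecewise-linear interpolant. The Python sketch below, assuming NumPy and using an arbitrarily chosen test function from $BL_{2,R+1}$, checks in this simplest case the two properties used above: $\tilde f$ is still $2$-Lipschitz, and $\|f - \tilde f\|_\infty \le 2\epsilon\sqrt{k}$.

```python
import numpy as np

R, eps = 3.0, 0.25
grid = np.arange(-(R + 1), R + 1 + eps / 2, eps)     # lattice sites in [-(R+1), R+1]

def f(x):
    """sin(x) * phi_R(x): sup|f| <= 1, |f|_L <= 2, and f = 0 for |x| > R + 1."""
    return np.sin(x) * np.clip(R + 1.0 - np.abs(x), 0.0, 1.0)

def f_tilde(x):
    """Piecewise-linear interpolant of f through the lattice sites."""
    return np.interp(x, grid, f(grid))

x = np.linspace(-(R + 1), R + 1, 100001)
sup_err = np.abs(f(x) - f_tilde(x)).max()
lip_tilde = np.abs(np.diff(f(grid)) / np.diff(grid)).max()   # slope of f_tilde on each cell
print(sup_err, lip_tilde)
```

In one dimension the Lipschitz constant of $\tilde f$ is exactly the largest slope over adjacent grid points, which is the $k = 1$ case of restricting to differences $x \sim y$.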



The virtue of this approximation is that it replaces a sub-Gaussian process indexed by a ball in an infinite-dimensional space with one indexed by a ball in a finite-dimensional space, where Dudley's bound is finally to be applied. Let $T := \{\tilde f : f \in BL_{2,R+1}\} \subseteq 2B_\infty^M$; the covering numbers of the unit ball $\mathcal{B}$ of a finite-dimensional normed space $(V,\|\cdot\|)$ of dimension $M$ are known (see Lemma 2.6 of [13]) to be bounded as $N(\mathcal{B},\|\cdot\|,\epsilon) \le \exp\left[M\log\left(\frac{3}{\epsilon}\right)\right]$. This implies that

$$N(T,d^*,\epsilon) \le \exp\left[M\log\left(\frac{6\sqrt{B}}{\epsilon\sqrt{d}}\right)\right],$$

which in turn implies that

$$e_n(T,d^*) \le \frac{6\sqrt{B}}{\sqrt{d}}\,2^{-\frac{2^n}{M}}.$$

Applying Theorem 5 now yields

$$\mathbb{E}\left[\sup_{f\in BL_{2,R+1}}X_{\tilde f}\right] \le \frac{6L\sqrt{B}}{\sqrt{d}}\sum_{n\ge 0}2^{\frac{n}{2}}\,2^{-\frac{2^n}{M}}. \tag{7}$$

Now, for the terms in the sum with $\log(M) < (n+1)\log(2) - 3\log(n)$, the summands are bounded above by $2^{-n}$, contributing only a constant to the upper bound. On the other hand, the summand is maximized for $2^n = \frac{M}{2\log(2)}$, and is therefore bounded by $\sqrt{M}$. Since only $O(\log(M))$ terms remain, these estimates together show that the sum on the right-hand side of (7) is bounded by $L\log(M)\sqrt{M}$.
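The behavior of the series in (7) is easy to check numerically. The Python sketch below (standard library only; the function names are illustrative) evaluates $\sum_{n\ge0}2^{n/2}2^{-2^n/M}$ in log space, plugs the entropy-number bound $e_n(T,d^*) \le \frac{6\sqrt{B}}{\sqrt{d}}2^{-2^n/M}$ into the Dudley sum (2), and prints the ratio of the series to $\log(M)\sqrt{M}$, which stays below 2 for the values of $M$ tried here.

```python
import math

def dudley_series(M, n_max=200):
    """sum_{n>=0} 2^(n/2) * 2^(-2^n/M), the series in (7) without the factor 6L sqrt(B/d)."""
    total = 0.0
    for n in range(n_max + 1):
        exponent = (n / 2 - 2.0 ** n / M) * math.log(2)   # log of the n-th summand
        if exponent < -745:    # past the peak; exp() underflows to 0.0 from here on
            break
        total += math.exp(exponent)
    return total

def entropy_bound(n, M, B, d):
    """The bound e_n(T, d*) <= 6 sqrt(B/d) * 2^(-2^n/M) from the covering-number estimate."""
    return 6.0 * math.sqrt(B / d) * 2.0 ** (-(2.0 ** n) / M)

def dudley_bound(M, B, d, L=1.0, n_max=200):
    """L * sum_n 2^(n/2) e_n: the right-hand side of (2) with the entropy bound plugged in."""
    return L * sum(2.0 ** (n / 2) * entropy_bound(n, M, B, d) for n in range(n_max + 1))

for M in (10, 100, 10**4, 10**6):
    print(M, dudley_series(M) / (math.log(M) * math.sqrt(M)))
```

The summands grow roughly geometrically up to $2^n \approx M/(2\log(2))$ and then collapse doubly exponentially, which is why the estimate $L\log(M)\sqrt{M}$ is comfortable.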

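Putting all the pieces together,

$$\mathbb{E}\left[\sup_{\|f\|_{BL}\le 1}\big(\mathbb{E}[f(X_\Theta)\mid\Theta] - \mathbb{E}f(X_\Theta)\big)\right] \le \frac{2kB}{R^2} + 4\epsilon\sqrt{k} + L\log(M)\sqrt{\frac{MB}{d}}.$$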

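Choosing $\epsilon = \frac{2\sqrt{k}B}{R^2}$ and using the value of $M$ in terms of $R$ yields

$$\mathbb{E}\left[\sup_{\|f\|_{BL}\le 1}\big(\mathbb{E}[f(X_\Theta)\mid\Theta] - \mathbb{E}f(X_\Theta)\big)\right] \le \frac{10kB}{R^2} + \frac{L}{k^{1/4}}\sqrt{\frac{B}{d}}\left(\frac{c'R^3}{kB}\right)^{\frac{k}{2}}\log\left[\frac{c}{\sqrt{k}}\left(\frac{c'R^3}{kB}\right)^{k}\right].$$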

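Now choosing $R = c\,d^{\frac{1}{3k+4}}k^{\frac{2k+1}{2(3k+4)}}B^{\frac{k+1}{3k+4}}$ yields

$$\mathbb{E}\left[\sup_{\|f\|_{BL}\le 1}\big(\mathbb{E}[f(X_\Theta)\mid\Theta] - \mathbb{E}f(X_\Theta)\big)\right] \le L\,\frac{kB + B\log(d)}{d^{\frac{2}{3k+4}}\,k^{\frac{2k+1}{3k+4}}\,B^{\frac{2k+2}{3k+4}}}.$$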
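As a check on the exponents, here is a sketch of the bookkeeping with all absolute constants suppressed, assuming the choice $\epsilon = \frac{2\sqrt{k}B}{R^2}$, so that the two remaining terms are $\frac{kB}{R^2}$ and $\frac{1}{k^{1/4}}\sqrt{\frac{B}{d}}\left(\frac{R^3}{kB}\right)^{k/2}$ (the latter multiplied by $\log(M)$). Taking $R = d^{\frac{1}{3k+4}}k^{\frac{2k+1}{2(3k+4)}}B^{\frac{k+1}{3k+4}}$ gives $\frac{R^3}{kB} = d^{\frac{3}{3k+4}}k^{-\frac{5}{2(3k+4)}}B^{-\frac{1}{3k+4}}$, and collecting exponents,

$$\frac{kB}{R^2} = \frac{kB}{d^{\frac{2}{3k+4}}k^{\frac{2k+1}{3k+4}}B^{\frac{2k+2}{3k+4}}}, \qquad \frac{1}{k^{1/4}}\sqrt{\frac{B}{d}}\left(\frac{R^3}{kB}\right)^{\frac{k}{2}} = \frac{B}{d^{\frac{2}{3k+4}}k^{\frac{2k+1}{3k+4}}B^{\frac{2k+2}{3k+4}}},$$

since the exponents of $d$, $k$ and $B$ in the second expression are

$$\frac{3k}{2(3k+4)} - \frac{1}{2} = -\frac{2}{3k+4}, \qquad -\frac{1}{4} - \frac{5k}{4(3k+4)} = -\frac{2k+1}{3k+4}, \qquad \frac{1}{2} - \frac{k}{2(3k+4)} = 1 - \frac{2k+2}{3k+4}.$$

So the two error terms are balanced, and the extra factor $\log(M)$, which for this choice of $R$ grows roughly like a constant times $k + \log(d)$, accounts for the numerator $kB + B\log(d)$.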
This completes the proof of the first statement of the theorem. The second follows immediately using that $B = 1$ and observing that, under the assumption that $A \le C'\sqrt{d}$, the bound above always dominates the error term coming from Theorem 1.

□ 

The proof of Corollary 4 is essentially immediate from Theorems 2 and 3.